FARMER: Finding Interesting Rule Groups in Biological Datasets

Authors

  • Gao Cong
  • Xin Xu
  • Feng Pan
  • Anthony K. H. Tung
Abstract

The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets may contain 10,000-100,000 columns but only 100-1000 rows. Association rules can reveal biologically relevant associations between genes and environments/categories to identify gene regulation pathways. However, most existing association rule mining algorithms have an exponential dependence on the number of columns. Moreover, the number of association rules generated from bioinformatics datasets is enormous due to the combinatorial explosion of frequent itemsets. In this paper, we describe a new algorithm called FARMER that is specially designed to discover interesting rule groups by identifying their upper bounds and lower bounds from biological datasets. FARMER exploits all user-specified constraints, including minimum support, minimum confidence and minimum chi-square, to support efficient pruning. Several experiments on real bioinformatics datasets show that FARMER is orders of magnitude better than previous association rule mining algorithms.
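
To make the three constraints named in the abstract concrete, the sketch below computes support, confidence and the chi-square statistic for a single rule "antecedent → class" over a toy dataset. It is not the FARMER algorithm and does no pruning or rule grouping; the function name `rule_measures`, the data layout (each row as a set of items plus a class label), the use of an absolute support count, and the sample rows are all assumptions made for illustration.

```python
def rule_measures(rows, antecedent, target):
    """Return (support, confidence, chi_square) for the rule antecedent -> target.

    rows: list of (items, label) pairs, where items is a set of item names.
    Hypothetical helper for illustration; not part of FARMER.
    """
    n = len(rows)
    # 2x2 contingency table: antecedent present/absent vs. target class yes/no.
    a = sum(1 for items, label in rows if antecedent <= items and label == target)
    b = sum(1 for items, label in rows if antecedent <= items and label != target)
    c = sum(1 for items, label in rows if not antecedent <= items and label == target)
    d = n - a - b - c

    support = a  # assumed here: absolute count of rows matching the whole rule
    confidence = a / (a + b) if (a + b) else 0.0

    # Standard chi-square statistic for a 2x2 table (no continuity correction).
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi_square = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return support, confidence, chi_square


# Toy data (made up): rows are samples, items are discretized gene states.
rows = [
    ({"g1", "g2"}, "cancer"),
    ({"g1"}, "cancer"),
    ({"g2"}, "normal"),
    ({"g3"}, "normal"),
]

print(rule_measures(rows, {"g1"}, "cancer"))
print(rule_measures(rows, {"g2"}, "cancer"))
```

A rule would pass the user-specified constraints only if all three values meet their respective minimum thresholds.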

Similar resources

Semantic Mining and Analysis of Gene Expression Data

Association rules can reveal biologically relevant relationships between genes and environments/categories. However, most existing association rule mining algorithms are rendered impractical on gene expression data, which typically contains thousands or tens of thousands of columns (gene expression levels), but only tens of rows (samples). The main problem is that these algorithms have an expone...

Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm

Abstract as a single objective one. Measures like support, confidence and other interestingness criteria which are used for evaluating a rule, can be thought of as different objectives of association rule mining problem. Support count is the number of records, which satisfies all the conditions that exist in the rule. This objective represents the accuracy of the rules extracted from the da...

Regional Association Rule Mining

This project [4] centers on regional association rule mining and scoping in spatial datasets. We introduce a methodology for mining spatial association rules and propose new algorithms to determine the scope of a spatial association rule. We develop a reward-based region discovery framework that employs clustering to find interesting regions. The framework is applied to solve two distinct reg...

Efficient Association Rule Mining Using Improved Apriori Algorithm

Association rule mining is a data mining technique to extract interesting relationships from large datasets [1, 2]. The efficiency of association rule mining algorithms has been a challenging research area in the domain of data mining [3]. Frequent pattern discovery, the task of finding sets of items that frequently occur together in a dataset, is the most resource-consuming phase of the rule mi...

Smart Drill Down

We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize “interesting” groups of tuples. Each group of tuples is described by a rule. For instance, the rule (a, b, ?, 1000) tells us that there are a thousand tuples with value a in the first column and b in the second column (and any value in the third column). Smart drill-down presents an...

Publication date: 2003